A Fast Approximation Scheme for Low-Dimensional k-Means
We consider the popular k-means problem in d-dimensional Euclidean space. Recently, Friggstad, Rezapour, and Salavatipour [FOCS'16] and Cohen-Addad, Klein, and Mathieu [FOCS'16] showed that the standard local search algorithm yields a (1+ε)-approximation in time (n·k)^{1/ε^{O(d)}}, giving the first polynomial-time approximation scheme for the problem in low-dimensional Euclidean space. While local search achieves optimal approximation guarantees, it is not competitive with state-of-the-art heuristics such as the famous k-means++ and D²-sampling algorithms. In this paper, we aim to bridge the gap between theory and practice by giving a (1+ε)-approximation algorithm for low-dimensional k-means running in time n·k·(log n)^{(dε^{-1})^{O(d)}}, thus matching the running time of the k-means++ and D²-sampling heuristics up to polylogarithmic factors. We speed up the local search approach by making a non-standard use of randomized dissections, which allows the best local move to be found efficiently using a quite simple dynamic program. We hope that our techniques can help design better local search heuristics for geometric problems.
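For orientation, the two baselines named in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the algorithm of the paper: it seeds with D²-sampling (the k-means++ seeding rule) and then runs a naive single-swap local search in which candidate centers are restricted to the input points. It uses no randomized dissections and no dynamic program, so it has none of the running-time guarantees discussed above. The function names and the improvement threshold `eps / k` are hypothetical choices made for this sketch.

```python
import random


def kmeans_cost(points, centers):
    """Sum over all points of the squared Euclidean distance to the nearest center."""
    return sum(
        min(sum((a - b) ** 2 for a, b in zip(pt, ctr)) for ctr in centers)
        for pt in points
    )


def d2_seeding(points, k, rng):
    """D^2-sampling: draw each new center with probability proportional to
    its squared distance to the nearest center chosen so far."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [kmeans_cost([pt], centers) for pt in points]
        if sum(weights) == 0:
            # Every point already coincides with a center; fall back to uniform.
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


def local_search(points, k, eps=0.01, seed=0):
    """Naive single-swap local search (assumption: candidate centers are input
    points). A swap is accepted only if it lowers the cost by a factor of at
    least (1 - eps / k), which guarantees termination."""
    rng = random.Random(seed)
    centers = d2_seeding(points, k, rng)
    cost = kmeans_cost(points, centers)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for cand in points:
                trial = centers[:i] + [cand] + centers[i + 1:]
                trial_cost = kmeans_cost(points, trial)
                if trial_cost < (1 - eps / k) * cost:
                    centers, cost = trial, trial_cost
                    improved = True
    return centers, cost


# Two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, cost = local_search(points, k=2)
```

Each pass of this naive search evaluates every (center, candidate) pair and recomputes the full cost, so a single pass takes on the order of n²·k² distance evaluations. Finding the best local move much faster than this is the kind of bottleneck the abstract's dissection-based dynamic program is meant to address.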
Similar resources
Approximation of stochastic advection diffusion equations with finite difference scheme
In this paper, a high-order and conditionally stable stochastic difference scheme is proposed for the numerical solution of the Itô stochastic advection diffusion equation with a one-dimensional white noise process. We applied a fourth-order finite difference approximation to discretize the spatial derivative of this equation. The main properties of deterministic difference schemes,...
From Practice to Theory; Approximation Schemes for Clustering and Network Design
What are the performance guarantees of the algorithms used in practice for clustering and network design problems? We answer this question by showing that the standard local search algorithm returns a nearly-optimal solution for low-dimensional Euclidean instances of the traveling salesman problem, Steiner tree, k-median and k-means. The result also extends to the case of graphs exclud...
Modification of the Fast Global K-means Using a Fuzzy Relation with Application in Microarray Data Analysis
Recognizing genes with distinctive expression levels can help in the prevention, diagnosis and treatment of diseases at the genomic level. In this paper, fast global k-means (fast GKM) is developed for clustering gene expression datasets. Fast GKM is a significant improvement of the k-means clustering method. It is an incremental clustering method which starts with one cluster. Iteratively ...
A bad 2-dimensional instance for k-means++
The k-means++ seeding algorithm is one of the most popular algorithms used for finding the initial k centers when using the k-means heuristic. The algorithm is a simple sampling procedure and can be described as follows: Pick the first center randomly from among the given points. For i > 1, pick a point to be the i-th center with probability proportional to the square of the Euclidean dist...
Efficient Approximation for Large-Scale Kernel Clustering Analysis
Kernel k-means is useful for clustering nonlinearly separable data, but it is hard to scale to large datasets because of its quadratic complexity. In this paper, we propose an approach which utilizes the low-dimensional feature approximation of the Gaussian kernel function to capitalize on a fast linear k-means solver to perform the nonlinear kernel k-means. This approach takes a...